Kangle Li, kl66@iu.edu
Genevieve Mortensen, gamorten@iu.edu
Sean Dixit, sedixit@iu.edu
Gantt Diagram
In our project, we aim to develop a comprehensive system for Cats and Dogs detection (CaDoD) using various machine learning techniques. Our team will take a phased approach, starting from baseline models and gradually advancing to more sophisticated deep learning architectures. We will leverage traditional machine learning algorithms like logistic regression, using both SKLearn and homegrown models, alongside deep learning frameworks such as PyTorch. We plan to first create a baseline model, then tune its hyperparameters and compare results to see whether performance improves. Additionally, we will explore transfer learning using EfficientDet and SWIN transformers to enhance our model's performance. Through this project, we seek not only to accurately classify cats and dogs but also to localize them within images by predicting bounding boxes. Through experimentation and iteration, we aim to deliver a robust, high-performing solution capable of accurately identifying and localizing cats and dogs in diverse real-world scenarios.
We will be using a subset of the Open Images Dataset V6, a large-scale dataset curated by Google to facilitate computer vision research and development. It contains millions of labeled images spanning a wide variety of categories, making it a valuable resource for training and evaluating machine learning models. Our subset contains 12,966 images of dogs and cats.
Why use logistic regression?
Logistic regression is easily interpretable, and it is effective for binary classification. We can modify the loss function to combine Cross-Entropy (CXE) with Mean Squared Error (MSE) for a multitask learning approach. Vanilla logistic regression models are designed only for classification; by extending the loss function to include an MSE term, we adapt the model to perform classification and regression simultaneously (the regression targets being the bounding box coordinates).
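As a minimal sketch of this combined objective (the weighting term `alpha` balancing the two tasks is a hypothetical choice, not part of the project spec):

```python
import numpy as np

def multitask_loss(y_true, p_pred, bbox_true, bbox_pred, alpha=1.0):
    """Combined loss: cross-entropy for the class label plus MSE for the box.

    y_true: 0/1 labels; p_pred: predicted P(dog); bbox_*: (n, 4) coordinates.
    alpha is a hypothetical weight balancing the two terms.
    """
    eps = 1e-12  # avoid log(0)
    cxe = -np.mean(y_true * np.log(p_pred + eps)
                   + (1 - y_true) * np.log(1 - p_pred + eps))
    mse = np.mean((bbox_true - bbox_pred) ** 2)
    return cxe + alpha * mse
```

With perfect predictions both terms vanish, so the loss approaches zero; tuning `alpha` trades off classification accuracy against box precision.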
How to measure success?
Accuracy: For the classification part (cats vs dogs), accuracy measures the percentage of correctly classified instances.
Precision and Recall: Useful in scenarios with imbalanced datasets or when one class's false positives/negatives are more critical.
Mean Squared Error (MSE): MSE will measure the average squared difference between the estimated values and the actual value, providing insight into the precision of the bounding box predictions.
Intersection over Union (IoU): An additional metric for bounding box accuracy, IoU measures the overlap between the predicted bounding box and the actual bounding box, offering a direct indication of prediction accuracy in spatial terms.
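The IoU metric can be sketched as follows (assuming boxes in [x_min, y_min, x_max, y_max] format, matching the dataset's bounding box columns):

```python
def iou(box_a, box_b):
    """Intersection over Union for boxes given as [x_min, y_min, x_max, y_max]."""
    # corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)  # zero if boxes are disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

IoU ranges from 0 (no overlap) to 1 (identical boxes), so it is directly comparable across images of different sizes when coordinates are normalized.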
Description of pipeline steps:
Data Collection & Preprocessing: Gather a dataset containing images of cats and dogs, annotated with class labels and bounding box coordinates. Preprocess this data for consistency, normalization, and augmentation to improve model robustness.
Feature Extraction: Transform raw image data into a suitable format for the logistic regression model, possibly using techniques like PCA for dimensionality reduction if dealing with raw pixel values.
Model Implementation: Develop the logistic regression model from scratch, extending the loss function to include both Cross-Entropy (for classification) and Mean Squared Error (for bounding box regression).
Training: Train the model on the prepared dataset, using the combined CXE + MSE loss to simultaneously learn classification and bounding box prediction.
Evaluation: Use the success metrics (Accuracy, Precision, Recall, MSE, IoU) to evaluate the model's performance on a separate test set.
Fine-tuning and Optimization: Adjust model parameters, learning rate, or preprocessing steps based on performance to improve outcomes.
The purpose of this project is to create an end-to-end machine learning pipeline for an object detector for cats and dogs. There are about 13,000 images of varying shapes and aspect ratios. All are RGB images, with bounding box coordinates stored in a .csv file. To create a detector, we will first preprocess the images to a common shape, take their RGB intensity values, and flatten each 3D pixel array into a row of a 2D array. We will then feed this array into a linear classifier and a linear regressor to predict labels and bounding boxes.
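The flattening step can be sketched with a toy example (the sizes below are illustrative; the notebook itself resizes to 128x128):

```python
import numpy as np

# toy batch: 4 RGB images, each 8x8 pixels (the notebook uses 128x128)
images = np.random.randint(0, 256, size=(4, 8, 8, 3), dtype=np.uint8)

# flatten each image's 3D pixel array into one feature row of a 2D array
X = images.reshape(images.shape[0], -1)
print(X.shape)  # (4, 192): one row of 8*8*3 intensity values per image
```

Each row then serves as the feature vector fed to the linear classifier and regressor.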
The image archive cadod.tar.gz is a subset of Open Images V6. It contains a total of 12,966 images of dogs and cats.
Image bounding boxes are stored in the CSV file cadod.csv. The following describes what's contained inside the CSV.
The attributes have the following definitions:
from collections import Counter
import glob
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
from PIL import Image
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import SGDClassifier, SGDRegressor
from sklearn.metrics import accuracy_score, mean_squared_error, roc_auc_score
from sklearn.model_selection import train_test_split
import tarfile
from tqdm.notebook import tqdm
import warnings
def extract_tar(file, path):
    """
    Extract a tar.gz archive to the specified location.

    Args:
        file (str): path where the archive is located
        path (str): path where you want to extract
    """
    files_extracted = 0
    with tarfile.open(file) as tar:
        for member in tqdm(tar.getmembers()):
            if os.path.isfile(path + member.name[1:]):
                continue
            tar.extract(member, path)
            files_extracted += 1
    if files_extracted < 3:
        print('Files already exist')
path = 'images/'
extract_tar('cadod.tar.gz', path)
Files already exist
!ls -l
total 14336
-rwxr-xr-x 1 root root 4634116 Apr  9  2021 CaDoD_Phase_1_baseline.ipynb
-rwxr-xr-x 1 root root 4633634 Nov 12 22:35 CaDoD_Phase_1_baseline_SKLearn_homegrown.ipynb
-rwxr-xr-x 1 root root 3300747 May  8  2021 CaDoD_Phase_2_PyTorch.ipynb
drwxr-xr-x 4 root root     128 Nov 12 22:24 Phase_1_Digit_detector_MLP
drwxr-xr-x 7 root root     224 Aug  5 23:14 Phase_2_cats_dogs_detector_Efficient_Det
drwxr-xr-x 3 root root      96 Aug 15 04:10 Phase_3_Cats_and_Dogs_SSD_3_levels_of_detection
df = pd.read_csv('cadod.csv')
df.head()
| ImageID | Source | LabelName | Confidence | XMin | XMax | YMin | YMax | IsOccluded | IsTruncated | ... | IsInside | XClick1X | XClick2X | XClick3X | XClick4X | XClick1Y | XClick2Y | XClick3Y | XClick4Y | area | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0000b9fcba019d36 | xclick | dog | 1 | 0.165000 | 0.903750 | 0.268333 | 0.998333 | 1 | 1 | ... | 0 | 0.636250 | 0.903750 | 0.748750 | 0.165000 | 0.268333 | 0.506667 | 0.998333 | 0.661667 | 0.539288 |
| 1 | 0000cb13febe0138 | xclick | dog | 1 | 0.000000 | 0.651875 | 0.000000 | 0.999062 | 1 | 1 | ... | 0 | 0.312500 | 0.000000 | 0.317500 | 0.651875 | 0.000000 | 0.410882 | 0.999062 | 0.999062 | 0.651264 |
| 2 | 0005a9520eb22c19 | xclick | dog | 1 | 0.094167 | 0.611667 | 0.055626 | 0.998736 | 1 | 1 | ... | 0 | 0.487500 | 0.611667 | 0.243333 | 0.094167 | 0.055626 | 0.226296 | 0.998736 | 0.305942 | 0.488059 |
| 3 | 0006303f02219b07 | xclick | dog | 1 | 0.000000 | 0.999219 | 0.000000 | 0.998824 | 1 | 1 | ... | 0 | 0.508594 | 0.999219 | 0.000000 | 0.478906 | 0.000000 | 0.375294 | 0.720000 | 0.998824 | 0.998044 |
| 4 | 00064d23bf997652 | xclick | dog | 1 | 0.240938 | 0.906183 | 0.000000 | 0.694286 | 0 | 0 | ... | 0 | 0.678038 | 0.906183 | 0.240938 | 0.522388 | 0.000000 | 0.370000 | 0.424286 | 0.694286 | 0.461870 |
5 rows × 22 columns
print(f"There are a total of {len(glob.glob1(path, '*.jpg'))} images")
There are a total of 12966 images
print(f"The total size is {os.path.getsize(path)/1000} MB")
The total size is 844.512 MB
df.shape
(12966, 22)
Replace LabelName with human-readable labels
df.LabelName.replace({'/m/01yrx':'cat', '/m/0bt9lr':'dog'}, inplace=True)
df.LabelName.value_counts()
dog 6855 cat 6111 Name: LabelName, dtype: int64
df.LabelName.value_counts().plot(kind='bar')
plt.title('Image Class Count')
plt.show()
df.describe()
| Confidence | XMin | XMax | YMin | YMax | IsOccluded | IsTruncated | IsGroupOf | IsDepiction | IsInside | XClick1X | XClick2X | XClick3X | XClick4X | XClick1Y | XClick2Y | XClick3Y | XClick4Y | area | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 12966.0 | 12966.000000 | 12966.000000 | 12966.000000 | 12966.000000 | 12966.000000 | 12966.000000 | 12966.000000 | 12966.000000 | 12966.000000 | 12966.000000 | 12966.000000 | 12966.000000 | 12966.000000 | 12966.000000 | 12966.000000 | 12966.000000 | 12966.000000 | 12966.000000 |
| mean | 1.0 | 0.099437 | 0.901750 | 0.088877 | 0.945022 | 0.464754 | 0.738470 | 0.013651 | 0.045427 | 0.001157 | 0.390356 | 0.424582 | 0.494143 | 0.506689 | 0.275434 | 0.447448 | 0.641749 | 0.582910 | 0.688754 |
| std | 0.0 | 0.113023 | 0.111468 | 0.097345 | 0.081500 | 0.499239 | 0.440011 | 0.118019 | 0.209354 | 0.040229 | 0.358313 | 0.441751 | 0.405033 | 0.462281 | 0.415511 | 0.401580 | 0.448054 | 0.403454 | 0.179648 |
| min | 1.0 | 0.000000 | 0.408125 | 0.000000 | 0.451389 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | 0.400178 |
| 25% | 1.0 | 0.000000 | 0.830625 | 0.000000 | 0.910000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.221293 | 0.096875 | 0.285071 | 0.130000 | 0.024323 | 0.218333 | 0.405817 | 0.400000 | 0.532997 |
| 50% | 1.0 | 0.061250 | 0.941682 | 0.059695 | 0.996875 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.435625 | 0.415625 | 0.531919 | 0.623437 | 0.146319 | 0.480839 | 0.825000 | 0.646667 | 0.676201 |
| 75% | 1.0 | 0.167500 | 0.998889 | 0.144853 | 0.999062 | 1.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.609995 | 0.820000 | 0.787500 | 0.917529 | 0.561323 | 0.729069 | 0.998042 | 0.882500 | 0.835382 |
| max | 1.0 | 0.592500 | 1.000000 | 0.587088 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 0.999375 | 0.999375 | 1.000000 | 0.999375 | 0.999375 | 0.999375 | 1.000000 | 0.999375 | 1.000000 |
# plot 6 random images with their bounding boxes
fig, ax = plt.subplots(nrows=2, ncols=3, sharex=False, sharey=False, figsize=(15, 10))
ax = ax.flatten()
for i, j in enumerate(np.random.choice(df.shape[0], size=6, replace=False)):
    img = mpimg.imread(path + df.ImageID.values[j] + '.jpg')
    h, w = img.shape[:2]
    coords = df.iloc[j, 4:8]
    ax[i].imshow(img)
    ax[i].set_title(df.LabelName[j])
    ax[i].add_patch(plt.Rectangle((coords[0]*w, coords[2]*h),
                                  coords[1]*w - coords[0]*w, coords[3]*h - coords[2]*h,
                                  edgecolor='red', facecolor='none'))
plt.tight_layout()
plt.show()
Go through all images and record the shape of the image in pixels and the memory size
img_shape = []
img_size = np.zeros((df.shape[0], 1))
for i, f in enumerate(tqdm(glob.glob1(path, '*.jpg'))):
    file = path + '/' + f
    img = Image.open(file)
    img_shape.append(f"{img.size[0]}x{img.size[1]}")
    img_size[i] += os.path.getsize(file)
Count all the different image shapes
img_shape_count = Counter(img_shape)
# create a dataframe for image shapes
img_df = pd.DataFrame(set(img_shape_count.items()), columns=['img_shape','img_count'])
img_df.shape
(594, 2)
There are a ton of different image shapes. Let's narrow this down by summing the counts of any image shape that appears fewer than 100 times and putting them in a category called other.
img_df = img_df.append({'img_shape': 'other',
                        'img_count': img_df[img_df.img_count < 100].img_count.sum()},
                       ignore_index=True)
Drop all image shapes with a count below 100 (they are now aggregated under other)
img_df = img_df[img_df.img_count >= 100]
Check if the count sum matches the number of images
img_df.img_count.sum() == df.shape[0]
True
Plot
img_df.sort_values('img_count', inplace=True)
img_df.plot(x='img_shape', y='img_count', kind='barh', figsize=(8,8), legend=False)
plt.title('Image Shape Counts')
plt.show()
# convert to megabytes
img_size = img_size / 1000
fig, ax = plt.subplots(1, 2, figsize=(15,5))
fig.suptitle('Image Size Distribution')
ax[0].hist(img_size, bins=50)
ax[0].set_title('Histogram')
ax[0].set_xlabel('Image Size (MB)')
ax[1].boxplot(img_size, vert=False, widths=0.5)
ax[1].set_title('Boxplot')
ax[1].set_xlabel('Image Size (MB)')
ax[1].set_ylabel('Images')
plt.show()
!mkdir -p images/resized
%%time
# resize images to 128x128, save them, and flatten into a numpy array
img_arr = np.zeros((df.shape[0], 128*128*3))  # initialize np.array
for i, f in enumerate(tqdm(df.ImageID)):
    img = Image.open(path + f + '.jpg')
    img_resized = img.resize((128, 128))
    img_resized.save("images/resized/" + f + '.jpg', "JPEG", optimize=True)
    img_arr[i] = np.asarray(img_resized, dtype=np.uint8).flatten()
CPU times: user 1min 51s, sys: 10.3 s, total: 2min 1s Wall time: 3min 24s
Plot the resized and filtered images
# plot 6 random resized images
fig, ax = plt.subplots(nrows=2, ncols=3, sharex=False, sharey=False, figsize=(15, 10))
ax = ax.flatten()
for i, j in enumerate(np.random.choice(df.shape[0], size=6, replace=False)):
    img = mpimg.imread(path + '/resized/' + df.ImageID.values[j] + '.jpg')
    h, w = img.shape[:2]
    coords = df.iloc[j, 4:8]
    ax[i].imshow(img)
    ax[i].set_title(df.iloc[j, 2])
    ax[i].add_patch(plt.Rectangle((coords[0]*w, coords[2]*h),
                                  coords[1]*w - coords[0]*w, coords[3]*h - coords[2]*h,
                                  edgecolor='red', facecolor='none'))
plt.tight_layout()
plt.show()
# encode labels
df['Label'] = (df.LabelName == 'dog').astype(np.uint8)
!mkdir -p data
np.save('data/img.npy', img_arr.astype(np.uint8))
np.save('data/y_label.npy', df.Label.values)
np.save('data/y_bbox.npy', df[['XMin', 'YMin', 'XMax', 'YMax']].values.astype(np.float32))
X = np.load('data/img.npy', allow_pickle=True)
y_label = np.load('data/y_label.npy', allow_pickle=True)
y_bbox = np.load('data/y_bbox.npy', allow_pickle=True)
idx_to_label = {1:'dog', 0:'cat'} # encoder
Double check that it loaded correctly
# plot 6 random images from the saved arrays
fig, ax = plt.subplots(nrows=2, ncols=3, sharex=False, sharey=False, figsize=(15, 10))
ax = ax.flatten()
for i, j in enumerate(np.random.choice(X.shape[0], size=6, replace=False)):
    coords = y_bbox[j] * 128
    ax[i].imshow(X[j].reshape(128, 128, 3))
    ax[i].set_title(idx_to_label[y_label[j]])
    ax[i].add_patch(plt.Rectangle((coords[0], coords[1]),
                                  coords[2] - coords[0], coords[3] - coords[1],
                                  edgecolor='red', facecolor='none'))
plt.tight_layout()
plt.show()
Create training and testing sets
X_train, X_test, y_train, y_test_label = train_test_split(X, y_label, test_size=0.01, random_state=27)
I'm choosing SGDClassifier because the dataset is large: it trains with stochastic gradient descent and supports early stopping. With this many parameters, a model can easily overfit, so it's important to find the point where overfitting begins and stop there for optimal results.
%%time
model = SGDClassifier(loss='log', n_jobs=-1, random_state=27, learning_rate='adaptive', eta0=1e-10,
                      early_stopping=True, validation_fraction=0.1, n_iter_no_change=3)
# 0.2 validation TODO
model.fit(X_train, y_train)
CPU times: user 1min 10s, sys: 40.6 s, total: 1min 51s Wall time: 1min 40s
SGDClassifier(early_stopping=True, eta0=1e-10, learning_rate='adaptive',
loss='log', n_iter_no_change=3, n_jobs=-1, random_state=27)
model.n_iter_
4
Did it stop too early? Let's retrain with a few more iterations to see. Note that SGDClassifier has a parameter called validation_fraction, which splits a validation set off from the training data to determine when to stop.
X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=0.1, random_state=27)
model2 = SGDClassifier(loss='log', n_jobs=-1, random_state=27, learning_rate='adaptive', eta0=1e-10)
epochs = 30
train_acc = np.zeros(epochs)
valid_acc = np.zeros(epochs)
for i in tqdm(range(epochs)):
    model2.partial_fit(X_train, y_train, np.unique(y_train))
    # log accuracy after each pass
    train_acc[i] += np.round(accuracy_score(y_train, model2.predict(X_train)), 3)
    valid_acc[i] += np.round(accuracy_score(y_valid, model2.predict(X_valid)), 3)
plt.plot(train_acc, label='train')
plt.plot(valid_acc, label='valid')
plt.title('CaDoD Training')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
del model2
expLog = pd.DataFrame(columns=["exp_name",
"Train Acc",
"Valid Acc",
"Test Acc",
"Train MSE",
"Valid MSE",
"Test MSE",
])
exp_name = "Baseline: Linear Model"
# use label-based indexing to avoid the positional-slice FutureWarning with .loc
expLog.loc[0, expLog.columns[:4]] = [exp_name] + list(np.round(
    [accuracy_score(y_train, model.predict(X_train)),
     accuracy_score(y_valid, model.predict(X_valid)),
     accuracy_score(y_test_label, model.predict(X_test))], 3))
expLog
| exp_name | Train Acc | Valid Acc | Test Acc | Train MSE | Valid MSE | Test MSE | |
|---|---|---|---|---|---|---|---|
| 0 | Baseline: SGDClassifier | 0.584 | 0.574 | 0.554 | NaN | NaN | NaN |
y_pred_label = model.predict(X_test)
y_pred_label_proba = model.predict_proba(X_test)
fig, ax = plt.subplots(nrows=2, ncols=5, sharex=False, sharey=False, figsize=(15, 6))
ax = ax.flatten()
for i in range(10):
    img = X_test[i].reshape(128, 128, 3)
    ax[i].imshow(img)
    ax[i].set_title("Ground Truth: {0} \n Prediction: {1} | {2:.2f}".format(idx_to_label[y_test_label[i]],
                                                                            idx_to_label[y_pred_label[i]],
                                                                            y_pred_label_proba[i][y_pred_label[i]]),
                    color=("green" if y_pred_label[i] == y_test_label[i] else "red"))
plt.tight_layout()
plt.show()
Train a linear regression model on multiple target values $[y_1, y_2, y_3, y_4]$ corresponding to [x_min, y_min, x_max, y_max] of the bounding box containing the object of interest. For more details see scikit-learn's documentation on LinearRegression.
### Split data
X_train, X_test, y_train, y_test = train_test_split(X, y_bbox, test_size=0.01, random_state=27)
X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=0.1, random_state=27)
%%time
from sklearn.linear_model import LinearRegression
# closed-form solution; Lasso or Ridge could be used for regularization
model = LinearRegression(n_jobs=-1)
model.fit(X_train, y_train)
# might take a few minutes to train
CPU times: user 1h 26min 40s, sys: 5min 53s, total: 1h 32min 34s Wall time: 17min 24s
LinearRegression(n_jobs=-1)
expLog.iloc[0, 4:] = list(np.round([mean_squared_error(y_train, model.predict(X_train)),
                                    mean_squared_error(y_valid, model.predict(X_valid)),
                                    mean_squared_error(y_test, model.predict(X_test))], 3))
expLog
| exp_name | Train Acc | Valid Acc | Test Acc | Train MSE | Valid MSE | Test MSE | |
|---|---|---|---|---|---|---|---|
| 0 | Baseline: Linear Model | 0.584 | 0.574 | 0.554 | 0 | 0.036 | 0.035 |
y_pred_bbox = model.predict(X_test)
fig, ax = plt.subplots(nrows=2, ncols=3, sharex=False, sharey=False, figsize=(15, 10))
ax = ax.flatten()
for i, j in enumerate(np.random.choice(X_test.shape[0], size=6, replace=False)):
    img = X_test[j].reshape(128, 128, 3)
    coords = y_pred_bbox[j] * 128
    ax[i].imshow(img)
    ax[i].set_title("Ground Truth: {0} \n Prediction: {1} | {2:.2f}".format(idx_to_label[y_test_label[j]],
                                                                            idx_to_label[y_pred_label[j]],
                                                                            y_pred_label_proba[j][y_pred_label[j]]),
                    color=("green" if y_pred_label[j] == y_test_label[j] else "red"))
    ax[i].add_patch(plt.Rectangle((coords[0], coords[1]),
                                  coords[2] - coords[0], coords[3] - coords[1],
                                  edgecolor='red', facecolor='none'))
plt.tight_layout()
plt.show()
Implement a homegrown logistic regression model. Extend the loss function from CXE to CXE + MSE, i.e., make it a composite multitask loss function so that the resulting model predicts the class and the bounding box coordinates at the same time.
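A minimal sketch of one gradient-descent step for such a model, under stated assumptions: a shared flattened-pixel input with two linear heads (sigmoid for the class, identity for the box); the learning rate and the per-sample averaging convention are illustrative choices, not the notebook's exact implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_step(X, y, bbox, W_cls, W_box, lr=1e-3):
    """One gradient step on the combined CXE + MSE loss.

    X: (n, d) flattened images; y: (n,) 0/1 labels; bbox: (n, 4) targets.
    W_cls: (d,) classifier weights; W_box: (d, 4) regressor weights.
    """
    n = X.shape[0]
    p = sigmoid(X @ W_cls)                        # class probabilities
    box_pred = X @ W_box                          # bounding-box predictions
    grad_cls = X.T @ (p - y) / n                  # gradient of mean cross-entropy
    grad_box = 2 * X.T @ (box_pred - bbox) / n    # gradient of the MSE term
    return W_cls - lr * grad_cls, W_box - lr * grad_box
```

Repeating this step over mini-batches trains both heads jointly; a weighting factor on the MSE gradient could be added to balance the two tasks, mirroring the composite loss described above.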